Portable Compilation of Vector Expressions for Architectures with Memory Hierarchy
نویسندگان
چکیده
The paper presents a scheme of code generation for vector expressions implemented in the CC] compiler (CC] is a vector ANSI C superset aimed at vector and superscalar architectures). The scheme is based on two well-known optimization techniques { loop invariant code motion and iteration space tiling. The problem of nding the optimal tile size for the imperfectly nested loop system implementing a vector expression is addressed. Some experimental results demonstrating eeciency of the code generation scheme are also presented.
منابع مشابه
Accelerating QDP++ using GPUs
Graphic Processing Units (GPUs) are getting increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture controlling and making use of the compute power of their GPUs. CUDA provides sufficient support for C++ language elements to enable the Expression Template (ET) technique in the device memory domai...
متن کاملCompilation and Simulation Tool Chain for Memory Aware Energy Optimizations
Memory hierarchies are known to be the energy bottleneck of portable embedded devices. Numerous memory aware energy optimizations have been proposed. However, both the optimization and the validation is performed in an ad-hoc manner as a coherent compilation and simulation framework does not exist as yet. In this paper, we present such a framework for performing memory hierarchy aware energy op...
متن کاملCompilation techniques for parallel systems
Over the past two decades tremendous progress has been made in both the design of parallel architectures and the compilers needed for exploiting parallelism on such architectures. In this paper we summarize the advances in compilation techniques for uncovering and eeectively exploiting parallelism at various levels of granularity. We begin by describing the program analysis techniques through w...
متن کاملTwo Fundamental Limits on Dataflow Multiprocessing
This paper examines the argument for dataflow architectures in “Two Fundamental Issues in Multiprocessing[5].” We observe two key problems. First, the justification of extensive multithreading is based on an overly simplistic view of the storage hierarchy. Second, the local greedy scheduling policy embodied in dataflow is inadequate in many circumstances. A more realistic model of the storage h...
متن کاملGPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs
Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...
متن کامل